Intro

Central Texas was hit with downpours over the Fourth of July weekend that led to flash floods. The Guadalupe River near Kerrville went from under 2 feet to over 34 feet in just over an hour. The death toll is at least 90 as of July 7, most of them in Kerr County.

This is a data analysis of flash flood fatalities from NOAA and stream gage data from USGS.

Here’s what we found:

- NOAA storm event data goes back to 1950, but the first flash flood recorded there is from 1996. Between 1996 and 2024 (the latest full year), there were 1,923 direct deaths and 6,508 direct injuries from flash floods.
- Texas is the state with the most direct flash flood deaths over that entire period, with 385, followed by North Carolina (127), Missouri (107), Arizona (104), and Kentucky (99).
- The year with the most flash flood deaths in Texas between 1996 and 2024 is 2017, with 68 direct deaths. That would make 2025 the deadliest year for flash floods in Texas in recent decades.
- Since October 1997, as far back as records at this site go, the Guadalupe River at Kerrville, TX reached minor flooding on 4 days, moderate flooding on 6 days, and major flooding on 2 days. The highest river height recorded there is 34.29 feet, on July 4, 2025.
- At the start of July 4, 2025, the discharge at the Guadalupe River at Kerrville was about 3 cubic feet per second (cfs). That rate would fill an Olympic pool in 8.1 hours. Soon after sunrise (7:30 am), it reached 134,000 cfs. That rate would fill the same pool in 0.66 seconds (88,229 cubic feet divided by 134,000 cfs).
- That peak discharge was the second highest ever recorded by this stream monitor, with data going back to mid-1986. It’s worth noting that there is a gap in the data between 6:15 am and 7:30 am on July 4, before discharge starts to decrease.

Read below to see how we got these numbers.

Methods

We downloaded NOAA storm data through their bulk data download, getting annual files from 1950 through 2024 (StormEvents_details-ftp_v1.0).

Start by listing all the annual files and generating paths to import them.

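# packages used below (assumed; the original setup chunk is not shown)
library(tidyverse)     # dplyr, ggplot2, readr, stringr
library(janitor)       # clean_names()
library(data.table)    # rbindlist()
library(dataRetrieval) # readNWISdv(), readNWISuv()
library(DT)            # datatable()

dir <- "inputs/noaa/storm-events/"
base <- "StormEvents_details-ftp_v1.0_d"
seq <- 1950:2024
end <- "_c20250520.csv"

# one row per year; the 2020 file carries a different creation-date suffix
file_names <- data.frame(year = seq, 
                         file = paste0(dir, base, seq, end)) %>%
  mutate(file = ifelse(year == 2020, paste0(dir, base, "2020_c20250702.csv"), file))

rm(dir, base, seq, end)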
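Before importing, it can help to confirm that every generated path points to a file on disk, since at least one year (2020) uses a different suffix. The following is a minimal base R sketch, not part of the original analysis; `build_paths()` and its `overrides` argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: one path per year, with optional per-year suffix overrides
build_paths <- function(years, dir, base, default_suffix, overrides = character()) {
  suffix <- rep(default_suffix, length(years))
  idx <- match(names(overrides), as.character(years))
  suffix[idx[!is.na(idx)]] <- overrides[!is.na(idx)]
  data.frame(year = years, file = paste0(dir, base, years, suffix))
}

paths <- build_paths(
  years = 1950:2024,
  dir = "inputs/noaa/storm-events/",
  base = "StormEvents_details-ftp_v1.0_d",
  default_suffix = "_c20250520.csv",
  overrides = c("2020" = "_c20250702.csv")
)

# Years whose file is not on disk (empty when every file was downloaded)
missing_years <- paths$year[!file.exists(paths$file)]
```

If `missing_years` is not empty, the import step would fail on those paths, so it is cheaper to catch the mismatch here.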

Now make a function to import and filter events for flash floods. Iterate it over all the files.

import_storms_fc <- function(path){
  
  x <- read_csv(path, guess_max = 100000) %>% clean_names()
  
  # filter flash floods
  # keep relevant columns
  x <- x %>%
    mutate(state = str_to_title(state)) %>%
    filter(event_type == "Flash Flood") %>%
    select(state, year, injuries_direct, injuries_indirect, deaths_direct, deaths_indirect,
           flood_cause)
  
  return(x)
  
}

flash_floods <- lapply(file_names$file, import_storms_fc) %>% rbindlist()

It looks like the earliest record of a flash flood in this data is from 1996. Since then, there have been 1,923 direct deaths and 6,508 direct injuries.
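One way to confirm the first year with flash flood records is to take the range of the `year` column. This is a minimal sketch on a made-up stand-in for `flash_floods`; the rows below are invented for illustration and are not NOAA data.

```r
# Toy stand-in for flash_floods; values are invented for illustration
toy_floods <- data.frame(
  state = c("Texas", "Texas", "Missouri", "Arizona"),
  year = c(1996, 2017, 2003, 2010),
  deaths_direct = c(1, 3, 0, 2)
)

# Earliest and latest year present in the data
year_span <- range(toy_floods$year)

# Direct deaths for each year that appears in the data
deaths_by_year <- aggregate(deaths_direct ~ year, data = toy_floods, FUN = sum)
```

Running `range(flash_floods$year)` on the real data would be the equivalent check.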

Let’s aggregate this annual data by state.

# between 1996 and 2024
sum(flash_floods$injuries_direct)
## [1] 6508
sum(flash_floods$injuries_indirect)
## [1] 69
sum(flash_floods$deaths_direct)
## [1] 1923
sum(flash_floods$deaths_indirect)
## [1] 65
# group by state and remove territories
flash_floods_state <- flash_floods %>%
  group_by(state) %>%
  summarise(injuries_direct = sum(injuries_direct, na.rm = T),
            deaths_direct = sum(deaths_direct, na.rm = T)) %>%
  filter(!state %in% c("American Samoa", "Puerto Rico", "Virgin Islands", "Guam"))

# quick chart of top 10 states
flash_floods_state %>% 
  arrange(desc(deaths_direct)) %>%
  head(10) %>%
  mutate(state = factor(state, levels = rev(unique(state)))) %>%
  ggplot(aes(x = deaths_direct, y = state)) +
  geom_bar(stat = "identity", fill = "#1665CF") +
  theme_linedraw() +
  labs(title = "Top 10 States by Flash Flood Deaths (1996-2024)",
     x = "Direct Deaths",
     y = "")

Texas is the state with the most flash flood deaths. Let’s take a closer look at annual numbers. Between 1996 and 2024, the year with the most deaths is 2017, with 68.

# get an annual timeseries for texas
flash_floods_tx <- flash_floods %>%
  filter(state == "Texas") %>%
  group_by(year) %>%
  summarise(injuries_direct = sum(injuries_direct, na.rm = T),
            deaths_direct = sum(deaths_direct, na.rm = T))

# quick chart
flash_floods_tx %>%
  ggplot(aes(x = year, y = deaths_direct)) +
  geom_bar(stat = "identity", fill = "#1665CF") +
  theme_linedraw() +
  labs(title = "Texas Annual Flash Flood Deaths (1996-2024)",
     y = "Direct Deaths",
     x = "")

Here is the table for state data (all years combined).

And the table for Texas annual data (1996-2024).

Now, let’s move to stream gage data from USGS. We’ll access the data through the agency’s dataRetrieval package, but the same data can be found on the USGS website.

We’ll start with site 08166200, the Guadalupe River at Kerrville, TX. Continuous gage height data (15-minute intervals) is available since mid-2007. Daily gage height data, which includes mean/max/min aggregations, is available since mid-1997.

Query all daily gage height (parameter 00065) and then label, based on max height, the category for each day:

- action, “The level which, when reached by a rising stream, represents the level where the NWS or a customer/partner needs to take some type of mitigation action in preparation for possible significant hydrologic activity.”
- minor flooding, “Minimal or no property damage, but possibly some public threat.”
- moderate flooding, “Some inundation of structures and roads near stream.”
- major flooding, “Extensive inundation of structures and roads. Significant evacuations of people and/or transfer of property to higher elevations.”

These thresholds are set by the local NWS office. Corresponding values for our site can be found here.

kerrville <- readNWISdv(siteNumbers = "08166200",
                        parameterCd = "00065",
                        startDate = "1997-01-01",
                        endDate = "2025-07-07",
                        statCd = c("00001", "00003")) # 00001 = Max, 00003 = Mean

# rename columns for clarity
kerrville <- kerrville %>%
  clean_names() %>%
  select(site_no, date, stage_height_max = x_00065_00001, stage_height_mean = x_00065_00003)

# label flooding categories based on daily max (feet)
# days below the action stage (7 ft) are left as NA
kerrville <- kerrville %>%
  mutate(category = case_when(
    stage_height_max >= 7 & stage_height_max < 9 ~ "Action",
    stage_height_max >= 9 & stage_height_max < 12 ~ "Minor flooding",
    stage_height_max >= 12 & stage_height_max < 20 ~ "Moderate flooding",
    stage_height_max >= 20 ~ "Major flooding"))
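The same thresholds can be checked on a few sample values. This sketch uses base R's `cut()` instead of `case_when()`, and labels days below the action stage explicitly rather than leaving them as NA. `label_stage()` is a hypothetical helper name, and every sample height except 34.29 is made up.

```r
# Hypothetical helper: map a stage height in feet to a flood category
label_stage <- function(height_ft) {
  cut(height_ft,
      breaks = c(-Inf, 7, 9, 12, 20, Inf),
      labels = c("Below action", "Action", "Minor flooding",
                 "Moderate flooding", "Major flooding"),
      right = FALSE)  # intervals are [lower, upper), i.e. >= lower and < upper
}

sample_heights <- c(1.8, 7, 8.99, 9, 15.2, 20, 34.29)
stage_labels <- as.character(label_stage(sample_heights))
```

Testing the boundary values (7, 9, 12, 20) is a quick way to confirm that each threshold falls into the higher category, as the `>=` comparisons above intend.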

Since October 1997, as far back as records at this site go, the Guadalupe River at Kerrville reached minor flooding on 4 days, moderate flooding on 6 days, and major flooding on 2 days.

The highest river height recorded there is 34.29 feet, on July 4, 2025.

# get a quick count
kerrville %>%
  group_by(category) %>%
  summarise(count = n())
## # A tibble: 5 × 2
##   category          count
##   <chr>             <int>
## 1 Action                8
## 2 Major flooding        2
## 3 Minor flooding        4
## 4 Moderate flooding     6
## 5 <NA>               9954
# table with full data
kerrville %>%
  arrange(-stage_height_max) %>% # can switch to arrange(desc(date))
  datatable(extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf')))

Double-check that July 4 was the highest day by looking at the continuous data (15-minute intervals) for the site. The reading at 11:45 UTC on July 4, 2025 (6:45 am Central time) is indeed the highest recorded in this data.

# this gets subdaily data
kerrville_uv <- readNWISuv(siteNumbers = "08166200",
                        parameterCd = "00065",
                        startDate = "2007-01-01",
                        endDate = "2025-07-07")

# check max height recorded
kerrville_uv %>%
  arrange(-X_00065_00000) %>%
  head(1)
##   agency_cd  site_no            dateTime X_00065_00000 X_00065_00000_cd tz_cd
## 1      USGS 08166200 2025-07-04 11:45:00         34.29                P   UTC
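The `tz_cd` column shows that the query returns timestamps in UTC, so the peak time needs converting before it is read as local time. A minimal base R sketch of that conversion, assuming "America/Chicago" is the right zone for Kerrville:

```r
# The peak timestamp as returned by the query, in UTC
peak_utc <- as.POSIXct("2025-07-04 11:45:00", tz = "UTC")

# The same instant displayed in Central time (daylight time applies in July)
peak_central <- format(peak_utc, tz = "America/Chicago", usetz = TRUE)
# "2025-07-04 06:45:00 CDT"
```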

We can also look at the discharge (streamflow) data at this site. Continuous discharge data (15-minute intervals) at the Guadalupe River at Kerrville goes back to mid-1996. Daily discharge data, which includes mean/max/min aggregations, is available since mid-1986.

Query both as far back as they go.

# daily
kerrville_Q <- readNWISdv(siteNumbers = "08166200",
                        parameterCd = "00060",
                        startDate = "1986-01-01",
                        endDate = "2025-07-08",
                        statCd = c("00001", "00003")) # 00001 = Max, 00003 = Mean

# continuous
kerrville_uv_Q <- readNWISuv(siteNumbers = "08166200",
                        parameterCd = "00060",
                        startDate = "1996-01-01",
                        endDate = "2025-07-08")

# rename columns for clarity
kerrville_Q <- kerrville_Q %>%
  clean_names() %>%
  select(site_no, date, q_max = x_00060_00001, q_mean = x_00060_00003)

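# and change UTC to central time zone
# with_tz() changes the displayed zone without altering the instant;
# as.POSIXct() may leave the tz of an existing POSIXct unchanged
kerrville_uv_Q <- kerrville_uv_Q %>%
  clean_names() %>%
  select(site_no, date_time, q_cfs = x_00060_00000) %>%
  mutate(date_time = lubridate::with_tz(date_time, tzone = "America/Chicago"))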

At the start of July 4, 2025, the discharge at the Guadalupe River at Kerrville was 3.04 cubic feet per second (cfs). That rate would fill an Olympic pool (660,000 gallons, or 88,229 cubic feet) in 8.1 hours (88,229 cubic feet divided by 3.04 cfs gives seconds; dividing by 3,600 converts to hours).

Soon after sunrise (7:30 am), it reached 134,000 cfs. That rate would fill the same pool in 0.66 seconds (88,229 cubic feet divided by 134,000 cfs).
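The pool arithmetic can be reproduced in a few lines. This is a sketch of the calculation described above, not code from the original analysis; `fill_time_seconds()` is a hypothetical helper, and the conversion assumes US gallons (231 cubic inches each).

```r
gallons_per_pool <- 660000
# a US gallon is 231 cubic inches; a cubic foot is 1728 cubic inches
cubic_feet_per_gallon <- 231 / 1728
pool_ft3 <- gallons_per_pool * cubic_feet_per_gallon  # about 88,229 cubic feet

# Hypothetical helper: seconds needed to fill a volume at a given discharge
fill_time_seconds <- function(volume_ft3, discharge_cfs) volume_ft3 / discharge_cfs

hours_at_start <- fill_time_seconds(pool_ft3, 3.04) / 3600  # about 8.1 hours
seconds_at_peak <- fill_time_seconds(pool_ft3, 134000)      # about 0.66 seconds
```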

That peak discharge was the second highest ever recorded by this stream monitor, with data going back to mid-1986. It’s worth noting that there is a gap in the data between 6:15 am and 7:30 am on July 4, before discharge starts to decrease. That means the monitor may have missed even higher values, which can happen during extreme events.
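A gap like this can be found by comparing each timestamp with the previous one and flagging steps longer than the expected 15 minutes. The sketch below uses a toy series with invented discharge values; only the 6:15 am and 7:30 am gap boundaries come from the text above.

```r
# Toy 15-minute series with readings missing between 06:15 and 07:30
times <- as.POSIXct(c("2025-07-04 05:45", "2025-07-04 06:00", "2025-07-04 06:15",
                      "2025-07-04 07:30", "2025-07-04 07:45"),
                    tz = "America/Chicago")
toy_uv <- data.frame(date_time = times, q_cfs = c(100, 500, 2000, 9000, 8000))

# Minutes between consecutive readings, paired with the readings on either side
gaps <- data.frame(
  gap_start = toy_uv$date_time[-nrow(toy_uv)],
  gap_end = toy_uv$date_time[-1],
  minutes = as.numeric(diff(toy_uv$date_time), units = "mins")
)

# Keep only steps longer than the expected 15-minute interval
gaps <- gaps[gaps$minutes > 15, ]
```

Applied to `kerrville_uv_Q`, the same approach would list every interval where the monitor skipped readings, not only the one on July 4.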

Here is the daily data showing the max discharge going back to 1986:

kerrville_Q %>%
  arrange(-q_max) %>%
  select(site_no, date, max_discharge_cfs = q_max, mean_discharge_cfs = q_mean) %>%
  datatable(extensions = 'Buttons', options = list(
    dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf')))